Abstract

Background: The cholera outbreaks in Thailand during 2007-2010 were exclusively caused by the Vibrio cholerae O1 El Tor
variant carrying the cholera toxin gene of the classical biotype. We previously isolated a V. cholerae O1 El Tor strain from a
patient with diarrhea and designated it MS6. Multilocus sequence-typing analysis revealed that MS6 is most closely related
to the U.S. Gulf Coast clone with the exception of two novel housekeeping genes.

Methodology/Principal Findings: The nucleotide sequence of the genome of MS6 was determined and compared with
those of 26 V. cholerae strains isolated from clinical and environmental sources worldwide. We show here that the MS6
isolate is distantly related to the ongoing seventh pandemic V. cholerae O1 El Tor strains. These strains differ with respect to
polymorphisms in housekeeping genes, seventh pandemic group-specific markers, CTX phages, two genes encoding
predicted transmembrane proteins, the presence of metY (MS6_A0927) or hchA/luxR in a highly conserved region of the V.
cholerae O1 serogroup, and a superintegron (SI). We found that V. cholerae strains carry either hchA/luxR or metY and that
the V. cholerae O1 clade commonly possesses hchA/luxR, except for MS6 and U.S. Gulf Coast strains. These findings
illuminate the evolutionary relationships among V. cholerae O1 strains. Moreover, the MS6 SI carries a quinolone-resistance
gene cassette, which is closely related to those present in plasmid-borne integrons of other gram-negative bacteria.

Conclusions/Significance: Phylogenetic analysis reveals that MS6 is most closely related to a U.S. Gulf Coast clone,
indicating their divergence before that of the El Tor biotype strains from a common V. cholerae O1 ancestor. We propose
that MS6 serves as an environmental aquatic reservoir of V. cholerae O1.
Introduction

Vibrio cholerae, which is present in aquatic environments
worldwide, is a facultatively anaerobic, asporogenous, motile,
curved or straight gram-negative rod. There are more than 200
serogroups of V. cholerae, but only serogroups O1 and O139 cause
epidemics and pandemics of cholera in human populations [1],
and cholera toxin causes the major clinical signs of the disease.
The O1 serogroup is classified into classical or El Tor biotypes.
The sixth cholera pandemic (1899-1923) was caused by the
classical biotype, and the ongoing seventh cholera pandemic is
caused by El Tor. Several other outbreaks of cholera occurred
between the sixth and seventh pandemics, and some El Tor strains
were isolated and are designated pre-seventh pandemic El Tor.



During the past two decades, atypical V. cholerae O1 El Tor has
been isolated more frequently and has spread widely [2-6]. These
isolates produce a cholera toxin that is distinct from that expressed
by El Tor.

We isolated a strain from a clinical specimen, designated
MS6, that expresses the typical El Tor cholera toxin (genotype 3)
[7]. Characterization of MS6 using ribotyping, pulsed-field gel
electrophoresis, multiple-locus variable-number tandem-repeat
analysis, and multilocus sequence typing revealed that
the strain is not closely related to other strains isolated in Thailand
or other countries. The sequences of MS6 housekeeping genes
reveal that it is most closely related to V. cholerae O1 strains isolated
in the U.S. Gulf Coast area. The U.S. Gulf Coast clone [8,9] is
genetically distinct from several pathogenic clones of V. cholerae O1
[10] and has caused only sporadic disease or small outbreaks, with
no spread of transmission along the Gulf Coast [11]. Nevertheless,
two of 15 housekeeping genes of MS6 (malP and pepN) are
minimally related to those of U.S. Gulf Coast strains and
represent novel sequences according to nucleotide sequence
comparisons using BLAST.

Here, we report the characterization of the entire genome of V.
cholerae O1 El Tor strain MS6. The results of these analyses
enhance our understanding of the evolution and genetic basis of
the pathogenicity of V. cholerae.

Materials and Methods 

Ethics Statement 

The patient's consent was not required by the hospital, because 
the isolation of V. cholerae was performed as part of clinical 
management during hospitalization. To protect the privacy of the 
patient and the patient's family, all identifying information was 
excluded from this study. 

Strains, Growth Conditions, and DNA Isolation 

V. cholerae O1 El Tor serotype Ogawa strain MS6 was isolated
from an inpatient from Myanmar suffering from diarrhea who was
treated at a hospital located in a Thai-Myanmar border city [12].
MS6 was grown in Tryptic Soy Broth (Difco, Detroit, MI) at 37°C
for 18 h with shaking. Cells were collected by centrifugation, and
genomic DNA was extracted using proteinase K and phenol/
chloroform, treated with RNase, and purified.

Genome Sequencing, Assembly, and Annotation 

The genome of MS6 was sequenced using the Roche GS FLX
Titanium system (8-kb-span paired-end library). Newbler (version
2.B; 454 Life Sciences/Roche, Branford, CT) was used to
assemble 395,285 reads into two scaffolds (2.95 Mb and
1.11 Mb) comprising BB contigs and 53 stand-alone contigs
≥500 bp with an average read depth of 24.5. The gaps between
contigs were closed using the unassembled mate-paired reads,
sequencing of PCR amplicons generated using primers
flanking the gaps, or both. Further, Illumina sequence data (14.5 Gbases,
100-bp paired-end reads) were used to improve low-quality regions
using GenomeTraveler (In Silico Biology, Inc., Yokohama, Japan).
The whole genome sequence of MS6 was deposited in the DDBJ
(AP014524/AP014525).

The RAST server (ver. 4.0) [13] automatically generated
annotations of the 26 reference and MS6 genomes. Accession
numbers for complete and draft genome sequences are as follows:
CP001233.1/CP001234.1, AE003852.1/AE003853.1,
CP003069.1/CP003070.1, ACHX00000000, NC_012667/NC_012668,
ACVW00000000, ACIA00000000, AAUS02000000, ADAI00000000,
AAKI02000000, AATY01000000, AAKH03000000, ACHZ00000000,
AAKF03000000, AAUT01000000, ACFQ00000000, ACHW00000000,
AAWD01000000, CP000626/CP000627, AAUR00000000, AAWF01000000,
AAKJ02000000, AAUU01000000, ACHV00000000, ACHY00000000, and
AAWG00000000. Annotations were compared using the SEED
viewer (ver. 4.0) [14] and in silico Molecular Cloning Genomics
Edition (IMC-GE, ver. 5.2.BD; In Silico Biology, Inc.). Using the
Clusters of Orthologous Groups (COG) database [15], we assigned
functions to each protein family encoded by the MS6 genome.
Annotations of the MS6 genome regions encoding CTX phage sequences
and qnr were manually curated to incorporate evidence from
published articles and public databases.




Figure 1. Comparison of the large and small chromosomes of Vibrio cholerae O1 El Tor MS6 and those of three reference strains. The
first and second outermost circles of each chromosome show the COG functional categories of the MS6 coding regions, in the clockwise and
anticlockwise directions, respectively. The next three circles compare the coding regions of V. cholerae O1 M66-2, N16961, and 2010EL-1786 with
those of MS6. The sixth and seventh circles show the GC content of the MS6 sequence and the percent G+C deviation by strand, respectively.
COG functional categories: C, Energy production and conversion; D, Cell division and chromosome partitioning; E, Amino acid transport and
metabolism; F, Nucleotide transport and metabolism; G, Carbohydrate transport and metabolism; H, Coenzyme metabolism; I, Lipid metabolism;
J, Translation, ribosomal structure and biogenesis; K, Transcription; L, DNA replication, recombination and repair; M, Cell envelope biogenesis,
outer membrane; N, Cell motility and secretion; O, Posttranslational modification, protein turnover, chaperones; P, Inorganic ion transport and
metabolism; Q, Secondary metabolites biosynthesis, transport and catabolism; R, General function prediction only; S, Function unknown;
T, Signal transduction mechanisms; Not assigned.
BLAST classification (0.0 < e-value < 1.0E-100): Identity = 100%; > 99.9%; > 99.8%; > 99.5%; > 99.0%; > 95.0%; > 90.0%; > 80.0%; > 70.0%;
> 0%; No hit.
doi:10.1371/journal.pone.0098120.g001
Figure 2. Maximum-likelihood tree showing the phylogenetic relationships among MS6 and 26 Vibrio cholerae strains representing
diverse serogroups. The tree was rooted by treating the VL426 strain as an outlier. Bootstrap supports (%) are indicated at the branching points.
Branch lengths are proportional to the sequence differences.
doi:10.1371/journal.pone.0098120.g002



Phylogenetic Analyses 

Coding sequences (CDSs) present as a single copy in 27
genomes were analyzed using the pan-genomes analysis pipeline
(PGAP) 1.02 [16] with the default parameters. The CDSs were
aligned to the same length (including gaps) using MAFFT with the
L-INS-i strategy [17]. Further, we chose CDSs with a low
probability of recombination based on the PHI-test (cutoff value:
p-value ≤ 0.05) in SplitsTree4 [18]. Subsets of predicted amino
acid sequences of each strain were concatenated, and maximum
likelihood analyses were conducted with 100 bootstrap replicates
by using the Randomized Axelerated Maximum Likelihood
(RAxML) program [19]. The results were visualized using
Dendroscope 3 [20].
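The concatenation step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `concatenate_alignments` and the toy protein fragments are hypothetical, and the sketch assumes that each gene has already been aligned so that all strains share one row length per gene.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one concatenated row per strain.

    `alignments` is a list of dicts mapping strain name -> aligned sequence.
    Every dict must cover the same strains, and the sequences within one
    dict must share one length (gaps included).
    """
    if not alignments:
        return {}
    strains = sorted(alignments[0])
    for index, gene in enumerate(alignments):
        if sorted(gene) != strains:
            raise ValueError(f"alignment {index} covers a different strain set")
        if len({len(seq) for seq in gene.values()}) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
    return {s: "".join(gene[s] for gene in alignments) for s in strains}


# Toy input (hypothetical): two single-copy proteins aligned across three strains.
toy = [
    {"MS6": "MKT-A", "2740-80": "MKT-A", "N16961": "MKSQA"},
    {"MS6": "GLV", "2740-80": "GLV", "N16961": "GIV"},
]
supermatrix = concatenate_alignments(toy)
```

The resulting rows all have the same length, which is what a maximum-likelihood program expects from a concatenated alignment.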

Design of Primers and Conditions for Detection of the V.
cholerae Molecular Markers metY and hchA/luxR

PCR primers were designed to amplify metY or hchA/luxR by
using the sequences of the 27 V. cholerae genomes. The PCR
reactions employed two forward primers (metY-F,
5'-GCGTGAAACCGGAGATGATCC-3' and luxR-F,
5'-TAGCTCACCGCGAGCTCGTTG-3') and one reverse primer (Ijs-R,
5'-AGCGCAGAAGGTGTTACGCCA-3'). The theoretical amplicon
lengths for metY and hchA/luxR are 353 bp and 521 bp,
respectively. All amplification mixtures consisted of template
DNA, 0.2 µM of each primer, 200 µM of each deoxynucleoside
triphosphate, and 0.025 U/µl of Ex Taq polymerase in the buffer
supplied with the enzyme. After an initial denaturation step of
94°C for 5 min, the reaction was conducted using 30 cycles each
of 94°C for 30 s, 59°C for 30 s, and 72°C for 30 s. PCR products
were analyzed by electrophoresis on 1.5% agarose
gels, and the amplicons were detected using ethidium bromide.
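The way a theoretical amplicon length follows from a primer pair can be sketched in Python. The primer sequences are those given above; the template is a toy construct built only for illustration (it is not the MS6 sequence), and `amplicon_length` is a hypothetical helper that requires exact primer matches on the given strand.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product bounded by exact primer matches, or None.

    The forward primer must match the given strand; the reverse primer
    must match the opposite strand downstream of the forward site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


METY_F = "GCGTGAAACCGGAGATGATCC"      # metY-F, from the text
SHARED_R = "AGCGCAGAAGGTGTTACGCCA"    # shared reverse primer, from the text

# Toy template (assumption): forward site, 50-nt filler, reverse site.
toy_template = "TTT" + METY_F + "A" * 50 + reverse_complement(SHARED_R) + "GGG"
toy_length = amplicon_length(toy_template, METY_F, SHARED_R)
```

Running the same function against the relevant genomic region of a strain would be one way to check the 353-bp and 521-bp values reported above.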

Results and Discussion 

Comparison of the Genomes of V. cholerae O1 El Tor MS6
and Reference Strains

The V. cholerae MS6 genome consists of circular chromosomes 1
and 2, comprising 2,936,971 bp and 1,093,973 bp with average
G+C contents of 47.7% and 46.8%, respectively. RAST annotation
analysis of the MS6 genome predicted 3,746 open
reading frames (ORFs). The nucleotide sequences of the genes
encoding the components of the polysaccharide (wav cluster) and
O antigen (wbe gene cluster) biosynthetic pathways [21] were
highly similar to those of other O1 El Tor strains, indicating that
other organisms were not likely their source.
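For readers who wish to reproduce summary statistics such as the G+C contents above, a minimal Python sketch follows. The function names are hypothetical. The skew formula (G - C)/(G + C) is a common definition and is an assumption here; the text does not state how the "G+C deviation by strand" in Fig. 1 was computed.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); an assumed, commonly used definition."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq, size, step):
    """Yield (start, window) pairs so either statistic can be plotted along a chromosome."""
    for start in range(0, max(len(seq) - size, 0) + 1, step):
        yield start, seq[start:start + size]
```

Applying `gc_content` to each full chromosome sequence gives a single percentage, whereas combining `sliding_windows` with either function gives the kind of per-position track drawn in circular genome maps.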

We next compared the genome sequences of MS6 with those of
the prototype seventh pandemic El Tor, N16961 [22], seventh
pandemic atypical El Tor, 2010EL-1786 [23], and the pre-seventh
pandemic El Tor, M66-2 [24] strains deposited in the EMBL/
GenBank/DDBJ databases. We identified 3,420 core ORFs based
on comprehensive orthologous gene detection using reciprocal
comparison. ORFs of chromosome 1 were shared among the four
strains more frequently than those of chromosome 2.
The sequence of the superintegron (SI), which functions as a gene
capture system, varied considerably among the strains (Fig. 1). The
gene order of the common ORFs of each chromosome (i.e.,
synteny) was well conserved, except for a 184-kb inversion near the
replication origin of chromosome 1 in strains MS6 and 2010EL-
1786 compared with the other two strains.
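The text does not specify the algorithm behind the "reciprocal comparison". One widely used approach is the reciprocal best hit criterion, sketched below in Python as an assumption rather than as the method applied here. The locus tags and scores are hypothetical toy values, and the score tables stand in for the output of any pairwise similarity search.

```python
def best_hit(scores):
    """Return the subject with the highest score, or None if there are no hits."""
    return max(scores, key=scores.get) if scores else None


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b.

    Each argument maps a query ORF to a dict of {subject ORF: similarity score}.
    """
    pairs = set()
    for query, scores in a_vs_b.items():
        subject = best_hit(scores)
        if subject is not None and best_hit(b_vs_a.get(subject, {})) == query:
            pairs.add((query, subject))
    return pairs


# Toy score tables (hypothetical identifiers and values).
a_vs_b = {
    "A_0001": {"B_0001": 950.0, "B_0002": 120.0},
    "A_0002": {"B_0002": 400.0},
    "A_0003": {"B_0002": 380.0},
}
b_vs_a = {
    "B_0001": {"A_0001": 948.0},
    "B_0002": {"A_0002": 400.0, "A_0003": 380.0},
}
orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy case A_0003 is excluded because B_0002 points back to A_0002, which illustrates why the criterion yields one-to-one pairs suitable for defining a core ORF set.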
Figure 3. MS6, but not U.S. Gulf Coast strain 2740-80, carries the Vibrio seventh pandemic island-1 (VSP-1) between VC0174 and
VC0186. Dendrograms were constructed based on the genes flanking VSP-1 (VC0174 and VC0186) using the neighbor-joining method using MEGA
version 5.2 [37]. VSP-1 was identified in the eight V. cholerae strains shown in red. MS6 as well as the seventh pandemic strains carry the full VSP-1
sequence between VC0174 and VC0186, which is closely related to that of strain 2740-80. Scale bars indicate nucleotide substitutions per site.
doi:10.1371/journal.pone.0098120.g003



The sequences of the chromosomes 1 and 2 ORFs were
compared with the genomes of three V. cholerae O1 reference
strains. The shared ORFs were classified according to the
percentage identity of the DNA sequences (Fig. 1, outer rings 3-
5). Blocks of ORFs (green represents 95% and 99% identities) were
recognized in the chromosomes of each strain. Further, some of
the ORFs were highly conserved only in the pre-seventh pandemic
strain M66-2 or seventh pandemic strains N16961 and 2010EL-
1786. Therefore, the MS6 genome exhibits a mosaic structure,
which was likely generated by homologous recombination with
other V. cholerae chromosomes. Sixteen ORFs and 44 ORFs on the
large and small chromosomes, respectively, of MS6 were not
detected in the genomes of the three reference strains. Notably, 51
of these 60 ORFs are encoded by mobile genetic elements or the
SI.

The MS6 ORFs included in the 18 COG categories were
compared with the complete genome sequences of the three V.
cholerae O1 strains. The number of ORFs in these three strains that
were identical to those of MS6 was examined in proportion to the
total number of ORFs in each category (results not shown). The
average percentages of amino acid and nucleotide sequence
matches across all categories for chromosome 1 were 79% and
69%, respectively; however, similarities of ORFs on chromosome
2 were approximately 10% lower than those of chromosome 1. We
compared the COG-categorized ORFs of MS6 with those of 26
strains of V. cholerae (Fig. S1). Only the ORFs of V. cholerae O1 were
highly related to those of MS6.



The relationships among the 27 strains were further investigated
using genome-wide phylogenetic analysis (Fig. 2). All V. cholerae O1
strains except two comprised a clade (PG clade), and 16 strains of
the PG clade were further divided into two subclades [25]. The
PG-1 subclade comprises most of the V. cholerae O1 El Tor strains
and MO10 (O139), whereas the PG-2 subclade includes two
classical strains and V52 (O37). MS6 is most closely related to
the U.S. Gulf Coast strain 2740-80, indicating that these
organisms diverged before the phylogenetic separation of the El
Tor biotype strains from a common V. cholerae O1 ancestor.

Significant Features of the MS6 Genome

Two tandem copies of CTX prophages are present at the dimer
resolution site (dif1) of MS6 chromosome 1 (Fig. S2). The toxin-
linked cryptic element is present within the dif1 region of MS6,
indicating that these elements likely integrated into the host
chromosome through XerC/XerD-mediated recombination [26-
28]. No CTX prophage was detected at MS6 dif2. The 6.9-kb
CTXφ genome includes a DNA replication module designated
repeat sequence (RS) 2, which comprises rstR, rstA, and rstB, and a core
region comprising psh, cep, gIII (orfU), ace, zot, and ctxAB [29]. The
sequences of these genes in MS6 and O1 El Tor strain N16961 are
identical.

However, the intergenic region-1 (ig-1) located near rstR and the
toxboxes required for activating transcription from the cholera
toxin promoter (PctxAB) [30,31] differed between these strains.
Specifically, MS6 possesses three perfect direct repeats
Figure 4. Distribution of open reading frames (ORFs) of the MS6 SI among the 26 reference strains. The 279 ORFs (MS6_A0273 to
MS6_A0551) in the 144-kb SI of MS6 were compared with the genomes of 26 V. cholerae strains and were classified based on the percentage amino
acid sequence identity, and colored accordingly. The ORFs of MS6 are highly similar to those of the U.S. Gulf Coast strain 2740-80.
doi:10.1371/journal.pone.0098120.g004



(TTTTGAT) within PctxAB, as do strains MAK757, MO10,
V52, RC9, and CIRS101, whereas strain N16961 harbors four. The
ig-1 region in MS6 is longer than that of N16961. Annotation
using IMC-GE predicted that each ig-1 region of the CTX
prophage encodes a protein (CTXUG-1) composed of 91 amino
acid residues. Further, although MS6 lacks an RS1 region
(consisting of rstR, rstA, rstB, and rstC), which is usually associated
with CTXφ in V. cholerae O1 El Tor and O139 isolates [32], MS6
possesses a genomic island designated MS6CTXAGI that encodes
a similar rstC sequence (two amino acid residue differences
compared with N16961), four ORFs of unknown function, a
putative transcriptional regulator, and a CTXUG-1 homologue.
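Counting perfect direct repeats such as the TTTTGAT heptamer can be done with a short Python sketch. The helper name `max_tandem_repeats` and the toy sequences are hypothetical; the real promoter sequences are not reproduced here, and the sketch counts only exact, uninterrupted copies.

```python
import re


def max_tandem_repeats(seq, unit="TTTTGAT"):
    """Largest number of back-to-back exact copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Toy promoter-like fragments (assumption): flanks are arbitrary.
three_copies = "ACGGA" + "TTTTGAT" * 3 + "CCGTA"
four_copies = "ACGGA" + "TTTTGAT" * 4 + "CCGTA"
```

Applied to these toy fragments, the function distinguishes a three-copy arrangement from a four-copy arrangement, which mirrors the difference described between MS6 and N16961.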

Toxin coregulated pilus, which acts as a receptor for phage
CTX, is essential for colonizing infant mice as well as humans in
model systems and is encoded by sequences within a 45-kb
pathogenicity island (VPI-1) [33]. The Vibrio pathogenicity island-
2 (VPI-2; VC1758-VC1809; 57.3 kb) [34], which encodes
neuraminidase and proteins involved in sialic acid metabolism, is
present in MS6. The VPI-1 and VPI-2 regions in MS6 are highly
related to those of the other V. cholerae O1 strains. However, the
Vibrio seventh pandemic island-1 (VSP-1; VC0175-0185; 14 kb)
[32,35,36] and Vibrio seventh pandemic island-2 (VSP-2;
VC0490-VC0516; 27 kb) differ between MS6 and seventh
pandemic strains. The MS6 genome lacks VSP-2, and although
the entire VSP-1 region is present, its flanking genes VC0174 and
VC0186 are distantly related to those of seventh pandemic strains
(Fig. 3). The dendrograms based on their sequences indicate that
they are closely related to those of 2740-80.

We identified a novel 4.7-kb mobile genetic element designated
M1 in MS6, which is integrated into the spacer region between
rpmF and maf on the large chromosome (Fig. S3). MS6-M1
comprises six ORFs (MS6_1784 to MS6_1789), including a
putative integrase. The outer membrane protein of MS6 is
encoded by ompW, which is split by a transposon (Fig. S4). In
contrast, 11 bp of ompW is deleted in strain 2740-80. This gene is
conserved among V. cholerae strains and is utilized for identification
of V. cholerae strains [38]. Although the biological role of ompW is
unknown, its function may be linked to the adaptive response to
stress [39]. Among the 27 strains studied here, the integral
membrane protein MS6_A0359 with four transmembrane-spanning
helices (motif HPP) [40] is present in strains MS6, 2740-80,
and R385. Moreover, the putative RNA-binding protein
MS6_A0295 is present only in strains MS6 and 2740-80 in the
PG clade, whereas an MS6_A0295 homologue is present in 10 of
11 strains in distinct phyletic lineages.

The SI of MS6 is a Massive Gene Capture System 

The SI of V. cholerae O1 strain N16961 is a 127-kb integron
island that resides on the small chromosome (VCA0291-
VCA0508) [22]. Most genes predicted to reside within the SI
Figure 5. Vibrio cholerae O1 genomes can be divided into two clusters that possess either hchA/luxR or metY in a conserved syntenic
region of the small chromosome. Dendrograms were constructed based on hchA/luxR or metY using the method described in the legend to Fig.
3. Strains highlighted in blue belong to serogroup O1. However, four strains enclosed in the purple square may have undergone lateral gene
exchange of O-antigen gene clusters; thus, strains V52 and MO10 were converted into O37 and O139 serogroups, respectively, while strains 12129(1)
and TM 11079-80 gained the O1-antigen gene cluster [25].
doi:10.1371/journal.pone.0098120.g005



encode hypothetical proteins, and the SI may serve as a source of
genetic variation. Strains 2010EL-1786 and M66-2 harbor
approximately 100-kb SI regions compared with the 144-kb SI
of MS6. The RAST server automatically annotated 46 ORFs in
category R of the SI that represent the death-on-curing family of
toxin proteins, which contain the well-conserved central motif
HxFx[ND][AG]NKR [41]. We detected 39, four, and two ORFs
of this family in N16961, M66-2, and 2010EL-1786, respectively,
and we therefore suggest that toxin-antitoxin modules play a role
in maintaining the large SI in MS6. All predicted ORFs of the SI
of MS6 were compared with those of 26 reference V. cholerae
genomes (Fig. 4). ORFs identical or similar to those of MS6 were
present in the genomes of strains 2740-80, BX 330286, and
NCTC8457. In contrast, there was little similarity between the
MS6 ORFs and those of most of the non-O1/non-O139 strains.
The similar organization of ORFs within their SI domains suggests
a close relationship between MS6 and strain 2740-80.
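A motif written as HxFx[ND][AG]NKR translates directly into a regular expression, where x stands for any residue. The Python sketch below shows one way to screen predicted protein sequences for it. The helper name and the toy sequences are hypothetical, and a motif match alone is only a first filter, not an annotation.

```python
import re

# "x" in the motif means any residue, which is "." in regular-expression syntax.
DOC_MOTIF = re.compile(r"H.F.[ND][AG]NKR")


def find_motif(protein):
    """Return 0-based start positions of the motif in a protein sequence."""
    return [match.start() for match in DOC_MOTIF.finditer(protein.upper())]


# Toy sequences (assumption): the first carries the motif, the second does not.
with_motif = "MSLHAFVDGNKRTAL"
without_motif = "MSLHAFVEGNKRTAL"
```

Counting the ORFs for which `find_motif` returns a non-empty list would give a tally comparable in spirit to the per-strain counts reported above.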

A qnr Cassette is Present in the SI of MS6

On the small chromosome of MS6, qnr is located approximately
28 kb from the SI integrase (IntI4). The qnr cassette was not
detected in the chromosomes of other V. cholerae strains [42]. The
MS6 qnr nucleotide sequence is 99% (650/657) identical to that of
qnrVC4, which resides in a novel complex class 1 integron harboring the
ISCR1 element in an aquatic isolate of Aeromonas punctata [43].
Moreover, the gene cassette in the class 1 integron harbored attC
and a 214-bp noncoding sequence, which were a nearly perfect
match to the gene cassette of MS6 (Fig. S5). This finding provides
evidence that this gene cassette was mobilized from the SI into a
plasmid-borne integron through class 1 integrase activity [44],
Figure 6. Hypothetical evolutionary relationship among clades of Vibrio cholerae with reference to strain MS6. Hypothetical ancestral
Vibrio organisms are indicated by open circles. Although V. cholerae O1 possesses the partially overlapping hchA/luxR, they are replaced by metY in
strains MS6 and 2740-80.
leading to the transmission of the resistance integron to several
gram-negative bacteria.
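The identity values quoted for the qnr cassette are simple ratios of matching positions to aligned positions. The short Python sketch below recomputes the 650/657 figure given in this section; the function names are hypothetical, and the pairwise comparison assumes two sequences that are already aligned to equal length.

```python
def percent_identity(matches, aligned_length):
    """Percentage of identical positions over the aligned length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matches / aligned_length


def count_matches(seq_a, seq_b):
    """Number of identical positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)


qnr_identity = percent_identity(650, 657)   # counts reported in the text
```

The unrounded value lies just below 99%, so the figure in the text is consistent with rounding to the nearest whole percent.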

MetY is Present in MS6 and U.S. Gulf Coast Strains

All V. cholerae strains carry either hchA/luxR or metY between
MS6_A0926 and MS6_A0928 on the small chromosome (Fig. 5).
The PG clade, except for strains 2740-80 and MS6, harbors hchA/
luxR, and metY is present only in MS6 and strain 2740-80.
Although strains VL426, MZO-2, MZO-3, R385, 1587, and
12129(1) belong to distinct phyletic lineages, they harbor hchA/
luxR. Moreover, the synteny of the surrounding regions, DNA
sequences, or both are distinguishable from those of the O1
lineage.

The 122 isolates of V. cholerae O1 from a variety of sources
harbor hchA/luxR but not metY. In contrast, hchA/luxR is present in
47 of 64 non-O1/non-O139 V. cholerae isolates from clinical and
environmental sources, and metY is present in the remaining 17
strains. Further, the sequences of hchA/luxR of 40 selected isolates
of V. cholerae O1 El Tor are identical, including V. cholerae O1 El
Tor strains N16961 and MAK757 (Fig. 5). These findings suggest
that strain 2740-80 and MS6 reverted to metY from hchA/luxR in
the distant past (Fig. S6, Fig. 6). Sequence comparisons of strains of
the closely related species V. mimicus detected metY but not hchA/
luxR in the genomes of strains VM573, SX-4, VM603, MB-451,
and VM223.

The amino acid sequence of MS6_A0927 was 52% identical
(68% similar) to the product (O-acetyl-homoserine sulfhydrylase;
EC 2.5.1.49) of metY in Leptospira meyeri [46]. The predicted amino
acid sequence of Escherichia coli hchA, which encodes heat shock
protein 31 (Hsp31) [47,48], is 60% identical (similarity, 78%) to
that of V. cholerae O1. Moreover, evidence indicates that Hsp31
contributes to the resistance to acid of stationary-phase E. coli [49].
Acid tolerance represents a significant factor in the epidemic
spread and virulence of V. cholerae [50].

Conclusions 

The analysis of the complete genome of MS6, which is distantly
related to pathogenic O1 El Tor strains of V. cholerae, contributes
insights into the evolution of the V. cholerae O1 serogroup as well as
others. Our approach demonstrates that chromosomes 1 and 2
of MS6 were frequently modified by horizontal gene transfer from
other Vibrio species after divergence from a common ancestor of
the PG-1 subclade and MS6. The genomic features of MS6 are
most similar to those of U.S. Gulf Coast strain 2740-80.

Supporting information 

Figure S1 Clusters of orthologous group classes of the
open reading frames (ORFs) of V. cholerae MS6 are
similar to those of V. cholerae O1 strains but not to those
of the non-O1 serogroup. The 2,548 ORFs of the MS6
genome were assigned to 18 COG functional categories, and the
numbers of ORFs are shown in parentheses in the right column.
The percentage identities among the ORFs were calculated by
dividing the number of identical ORFs of each reference strain
genome by the number of MS6 ORFs of each functional category.
Strains highlighted in blue and red belong to serogroups O1 and
non-O1, respectively. We suggest that the four strains enclosed in
the purple square underwent lateral gene exchange of O-antigen
gene clusters (Fig. 5).
(EPS) 

Figure S2 The MS6 CTX prophage and adjacent genetic
elements. The MS6 genome harbors two CTX prophage
sequences between MS6_1229 and MS6_1252. The genomic
island MS6CTXAGI (CTX phage-Associated Genomic Island),
which comprises seven open reading frames (ORFs) as well as one
encoding an RstC homolog, resides between rtxA and the CTX
prophage. The lower panel shows the amino acid sequences of the
three genes encoding CTXUG-1.
(EPS) 

Figure S3 Genome structure of the mobile genetic
element MS6-M1. MS6-M1 resides in the spacer region
between rpmF (green) and maf (blue) on the large chromosome
(Ch1) and harbors 5 ORFs (orange) and a putative integrase (red).
The direct repeats (arrowheads) are separated by 4.7 kb. MS6-
M1-like structures were detected only in the superintegron (SI)
region of the classical strains O395 and R27, although these two
regions are interrupted by the transposon orfAB. The percentage
nucleotide sequence identities among the integrase and the four
ORFs are 73.2% and 89.5%, respectively.

(EPS) 

Figure S4 A transposable element disrupts MS6 ompW.
An outer membrane protein of strain N16961 is encoded by ompW
(VCA0867, 654 bp). The homologous gene of MS6 is interrupted
by an insertion of 1.2 kb encoding a transposase, and 11 bp were
deleted from the corresponding allele of strain 2740-80.
(EPS) 

Figure S5 The superintegron (SI) of MS6 harbors a
quinolone-resistance gene cassette (qnrVC4) present in
the class 1 integron of Aeromonas punctata 159. The
qnrVC4 cassette of MS6 is located between V. cholerae repeats
(VCRs are indicated in italics in the DNA sequence). VCR
represents the attachment C site (attC) associated with the captured
cassette in the SI of V. cholerae. VCR and a noncoding region
corresponding to the gene cassette of the class 1 integron are
linked to qnrVC4. The sequences of the latter two elements are
97% identical (988/1021).